How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies
Autonomous driving has the potential to revolutionize mobility and is hence
an active area of research. In practice, the behavior of autonomous vehicles
must be acceptable, i.e., efficient, safe, and interpretable. While vanilla
reinforcement learning (RL) finds performant behavioral strategies, they are
often unsafe and uninterpretable. Safety is introduced through Safe RL
approaches, but they still mostly remain uninterpretable as the learned
behaviour is jointly optimized for safety and performance without modeling them
separately. Interpretable machine learning is rarely applied to RL. This paper
proposes SafeDQN, which makes the behavior of autonomous vehicles safe
and interpretable while still being efficient. SafeDQN offers an
understandable, semantic trade-off between the expected risk and the utility of
actions while being algorithmically transparent. We show that SafeDQN finds
interpretable and safe driving policies for a variety of scenarios and
demonstrate how state-of-the-art saliency techniques can help to assess both
risk and utility.
Comment: 8 pages, 5 figures
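The trade-off between expected risk and utility described in this abstract can be illustrated with a minimal sketch. It assumes that utility and risk are estimated separately per action and combined linearly at decision time. The function `select_action`, the `risk_weight` parameter, and the example values are hypothetical and are not taken from the paper; the actual SafeDQN formulation may differ.

```python
def select_action(q_utility, q_risk, risk_weight=1.0):
    """Return (best_action_index, scores) for a risk-utility trade-off.

    q_utility:   per-action utility estimates (higher is better)
    q_risk:      per-action expected-risk estimates (higher is worse)
    risk_weight: assumed scalar controlling how strongly risk is penalized
    """
    # Keeping the two estimates separate lets each term be inspected on its own.
    scores = [u - risk_weight * r for u, r in zip(q_utility, q_risk)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical values: action 1 has the highest utility but also the highest risk.
utility = [1.0, 2.0, 1.5]
risk = [0.1, 1.5, 0.2]
risk_aware_action, _ = select_action(utility, risk, risk_weight=1.0)
risk_blind_action, _ = select_action(utility, risk, risk_weight=0.0)
```

With these made-up numbers, ignoring risk selects the high-utility action, while penalizing risk shifts the choice to a safer one.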
Quantum Policy Gradient Algorithm with Optimized Action Decoding
Quantum machine learning implemented by variational quantum circuits (VQCs)
is considered a promising concept for the noisy intermediate-scale quantum
computing era. Focusing on applications in quantum reinforcement learning, we
propose a specific action decoding procedure for a quantum policy gradient
approach. We introduce a novel quality measure that enables us to optimize the
classical post-processing required for action selection, inspired by local and
global quantum measurements. The resulting algorithm demonstrates a significant
performance improvement in several benchmark environments. With this technique,
we successfully execute a full training routine on a 5-qubit hardware device.
Our method introduces only negligible classical overhead and has the potential
to improve VQC-based algorithms beyond the field of quantum reinforcement
learning.
Comment: Accepted to the 40th International Conference on Machine Learning
(ICML 2023), Honolulu, Hawaii, USA. 22 pages, 10 figures, 3 tables
Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning
Reinforcement learning is a growing field in AI with a lot of potential.
Intelligent behavior is learned automatically through trial and error in
interaction with the environment. However, this learning process is often
costly. Using variational quantum circuits as function approximators can reduce
this cost. In order to implement this, we propose the quantum natural policy
gradient (QNPG) algorithm -- a second-order gradient-based routine that takes
advantage of an efficient approximation of the quantum Fisher information
matrix. We experimentally demonstrate that QNPG outperforms first-order based
training on Contextual Bandits environments regarding convergence speed and
stability and thereby reduces the sample complexity. Furthermore, we provide
evidence for the practical feasibility of our approach by training on a
12-qubit hardware device.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. 7 pages, 5 figures, 1 table
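For orientation, a generic natural-gradient update has the form below. This is the textbook form, not necessarily the exact routine of the paper: here $\theta_t$ are the circuit parameters, $J$ is the expected return, $F$ is the (quantum) Fisher information matrix, $\eta$ is a step size, and the regularizer $\lambda I$ is an assumption added to keep the inverse well defined. The abstract does not state how the approximation of $F$ is constructed.

```latex
\theta_{t+1} = \theta_t + \eta \, \bigl( F(\theta_t) + \lambda I \bigr)^{-1} \nabla_{\theta} J(\theta_t)
```

Setting $F = I$ and $\lambda = 0$ recovers the ordinary first-order policy gradient step that the abstract compares against.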
Cutting multi-control quantum gates with ZX calculus
Circuit cutting, the decomposition of a quantum circuit into independent
partitions, has become a promising avenue towards experiments with larger
quantum circuits in the noisy intermediate-scale quantum (NISQ) era. While
previous work focused on cutting qubit wires or two-qubit gates, in this work
we introduce a method for cutting multi-controlled Z gates. We construct a
decomposition and prove the upper bound on the associated
sampling overhead, where is the number of cuts in the circuit. This bound
is independent of the number of control qubits but can be further reduced to
for the special case of CCZ gates. Furthermore, we
evaluate our proposal on IBM hardware and experimentally show noise resilience
due to the strong reduction of CNOT gates in the cut circuits.
A Survey on Quantum Reinforcement Learning
Quantum reinforcement learning is an emerging field at the intersection of
quantum computing and machine learning. While we intend to provide a broad
overview of the literature on quantum reinforcement learning (our
interpretation of this term will be clarified below), we put particular
emphasis on recent developments. With a focus on already available noisy
intermediate-scale quantum devices, these include variational quantum circuits
acting as function approximators in an otherwise classical reinforcement
learning setting. In addition, we survey quantum reinforcement learning
algorithms based on future fault-tolerant hardware, some of which come with a
provable quantum advantage. We provide both a bird's-eye view of the field and
summaries and reviews of selected parts of the literature.
Comment: 62 pages, 16 figures
Uncovering Instabilities in Variational-Quantum Deep Q-Networks
Deep Reinforcement Learning (RL) has considerably advanced over the past
decade. At the same time, state-of-the-art RL algorithms require a large
computational budget in terms of training time to converge. Recent work has
started to approach this problem through the lens of quantum computing, which
promises theoretical speed-ups for several traditionally hard tasks. In this
work, we examine a class of hybrid quantum-classical RL algorithms that we
collectively refer to as variational quantum deep Q-networks (VQ-DQN). We show
that VQ-DQN approaches are subject to instabilities that cause the learned
policy to diverge, study the extent to which this afflicts reproducibility of
established results based on classical simulation, and perform systematic
experiments to identify potential explanations for the observed instabilities.
Additionally, and in contrast to most existing work on quantum reinforcement
learning, we execute RL algorithms on an actual quantum processing unit (an IBM
Quantum Device) and investigate differences in behaviour between simulated and
physical quantum systems that suffer from implementation deficiencies. Our
experiments show that, contrary to claims in the literature, it cannot
be conclusively decided if known quantum approaches, even if simulated without
physical imperfections, can provide an advantage as compared to classical
approaches. Finally, we provide a robust, universal and well-tested
implementation of VQ-DQN as a reproducible testbed for future experiments.
Comment: Authors Maja Franz, Lucas Wolf, Maniraman Periyasamy contributed
equally (name order randomised). To be published in the Journal of The
Franklin Institute
Enhanced Immersion for Binaural Audio Reproduction of Ambisonics in Six-Degrees-of-Freedom: The Effect of Added Distance Information
The immersion of the user is of key interest in the reproduction of acoustic scenes in virtual reality. It is enhanced when movement is possible in six degrees-of-freedom, i.e., three rotational plus three translational degrees. Further enhancement of immersion can be achieved when the user is not only able to move between distant sound sources, but can also move towards and behind close sources. In this paper, we employ a reproduction method for Ambisonics recordings from a single position that uses meta information on the distance of the sound sources in the recorded acoustic scene. A subjective study investigates the benefit of said distance information. Different spatial audio reproduction methods are compared with a multi-stimulus test. Two synthetic scenes are contrasted, one with close sources the user can walk around, and one with far away sources that cannot be reached. We found that for close or distant sources, loudness changing with the distance enhances the experience. In the case of close sources, the use of correct distance information was found to be important.
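The finding that loudness should change with distance can be illustrated with a minimal sketch of the free-field inverse-distance law for a point source. This is a standard acoustic approximation and not necessarily the rendering method used in the paper; the function names and the `min_distance` clamp are assumptions made for illustration.

```python
import math


def distance_gain(recorded_distance, listener_distance, min_distance=0.1):
    """Linear amplitude gain when the listener moves relative to a source.

    recorded_distance: source distance (m) at the original recording position
    listener_distance: source distance (m) at the listener's current position
    min_distance:      assumed clamp (m) to avoid unbounded gain near the source
    """
    clamped = max(listener_distance, min_distance)
    # Free-field point source: amplitude scales with 1 / distance.
    return recorded_distance / clamped


def gain_in_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


# Halving the distance from 2 m to 1 m doubles the amplitude (about +6 dB).
halved = distance_gain(2.0, 1.0)
halved_db = gain_in_db(halved)
```

The clamp reflects one possible design choice for sources the user can walk up to or behind; without it the gain would grow without bound as the distance approaches zero.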
Listening Tests with Individual versus Generic Head-Related Transfer Functions in Six-Degrees-of-Freedom Virtual Reality
Individual head-related transfer functions (HRTFs) improve localization accuracy and externalization in binaural audio reproduction compared to generic HRTFs. Listening tests are often conducted using generic HRTFs due to the difficulty of obtaining individual HRTFs for all participants. This study explores the ramifications of the choice of HRTFs for critical listening in a six-degrees-of-freedom audio-visual virtual environment, when participants are presented with an overall audio quality evaluation task. The study consists of two sessions using either individual or generic HRTFs. A small effect between the sessions is observed in a condition where elevation cues are impaired. Other conditions are rated similarly between individual and generic HRTFs.
Evaluation of binaural reproduction systems from behavioral patterns in a six-degrees-of-freedom wayfinding task
This paper proposes a new method for evaluating real-time binaural reproduction systems by means of a wayfinding task in six degrees of freedom. Participants physically walk to sound objects in a virtual reality created by a head-mounted display and binaural audio. We show how the localization accuracy of spatial audio rendering is reflected by objective measures of the participants' behavior. The method allows for comparative evaluation of different rendering systems as well as the subjective assessment of the quality of experience.